A New Compression Based Index Structure for Efficient Information Retrieval
نویسندگان
چکیده
Finding desired information from large data set is a difficult problem. Information retrieval is concerned with the structure, analysis, organization, storage, searching, and retrieval of information. Index is the main constituent of an IR system. Now a day exponential growth of information makes the index structure large enough affecting the IR system’s quality. So compressing the Index structure is our main contribution in this paper. We compressed the document number in inverted file entries using a new coding technique based on run-length encoding. Our coding mechanism uses a specified code which acts over run-length coding. We experimented and found that our coding mechanism on an average compresses 67.34% more than the other techniques.
منابع مشابه
Improved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملImproved Skips for Faster Postings List Intersection
Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...
متن کاملAssessing the Internal Structure of the Ellis Information Retrieval Model in Order to Present the Persian Norm of Web Retrieval Tools
Introduction: Study evaluated the internal structure of Ellis information seeking model in the student community with the aim of presenting the Persian norm. Methods: This is a descriptive-analytical study conducted by cross-sectional survey method in the second semester of the academic year 1399-1400. Population comprise of 280 graduate students at Ahvaz Jundishapur University of Medical Scien...
متن کاملReview of ranked-based and unranked-based metrics for determining the effectiveness of search engines
Purpose: Traditionally, there have many metrics for evaluating the search engine, nevertheless various researchers’ proposed new metrics in recent years. Aware of this new metrics is essential to conduct research on evaluation of the search engine field. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating the search engines. Methodology: This is ...
متن کاملComparative Analysis of Sparse Matrix Algorithms
We evaluate and compare the storage efficiency of different sparse matrix storage formats as index structure for text collection and their corresponding sparse matrixvector multiplication algorithm to perform query processing in information retrieval (IR) application. We show the results of our implementations for several sparse matrix algorithms such as Coordinate Storage (COO), Compressed Spa...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1209.5448 شماره
صفحات -
تاریخ انتشار 2012